Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


ISCB-LA SoIBio BioNetMX 2020 | Oct 28 – 29, 2020 | Virtual Symposium | Symposium Programme

ISCB-LA SoIBio BioNetMX Symposium 2020 Virtual Viewing Hall

Presentation 04: Bioinformatics modern challenges: from genes to ecosystems

Keywords: Keynote
  • Alicia Mastretta-Yanes, Mexico

Short Abstract: Alicia Mastretta-Yanes studied biology at UNAM and completed her PhD in evolutionary biology at the University of East Anglia, UK. She is currently a CONACYT Research Fellow at CONABIO, México. Her research focuses on the microevolutionary processes shaping Mexican biodiversity, ranging from the effects of topography and past climate fluctuations to the present-day implications of domestication and human management. In 2020 she won a L'Oréal–UNESCO-AMC Fellowship for Women in Science. She likes plants and keeps accumulating them at home, even when there is no more space. She is an external tutor in the Biological Sciences and Biomedical Sciences Postgraduate Programs at UNAM, where she also teaches bioinformatics for biologists.

Presentation 16: Ancestry Packages: Merging uniparental and autosomal genetic histories

Keywords: autosomal DNA variation MSY mtDNA Ancestry packages Sex-biased gene flow ADMIXTURE population genetics
  • Vladimir Bajić, Max Planck Institute for Evolutionary Anthropology, Germany
  • Mark Stoneking, Max Planck Institute for Evolutionary Anthropology, Germany

Short Abstract: Mitochondrial DNA (mtDNA) and the male-specific region of the Y chromosome (MSY) are commonly used uniparental markers in population genetics that provide information on the history and relationships of populations and individuals. Genetic profiles of a population inferred from mtDNA vs. the MSY often differ from each other, and from the genetic profile inferred from autosomal markers, due to differences in the maternal and paternal histories of human populations. Recently, many populations have been described using both uniparental and autosomal markers; however, little is known about the associations of uniparental haplogroups with autosomal ancestry components. Here we jointly use mtDNA and MSY haplogroup compositions together with autosomal ancestry components to define “ancestry packages”, i.e. associated combinations of specific mtDNA and MSY haplogroups with specific autosomal ancestry components, which can be indicative of ancestral genetic compositions. Our results show that i) uniparental haplogroups are highly associated with autosomal ancestry components, suggesting the existence of ancestry packages; ii) ancestry packages can be used to objectively classify the likely geographic origin of haplogroups (and of other markers for which population estimates are available) in accordance with autosomal ancestry components; and iii) ancestry packages can provide information about the potential direction and composition of sex-biased gene flow between different putative ancestral populations.
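
One simple way to picture an "association" between a uniparental haplogroup and an autosomal ancestry component is to correlate, across populations, the haplogroup's frequency with the population's mean proportion of each ancestry component, and pair each haplogroup with its most strongly correlated component. The sketch below does exactly that; it is an illustrative assumption, not the authors' statistic, and the haplogroup and component labels (H1, H2, K1, K2) and all values are hypothetical toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def association_matrix(haplogroup_freqs, ancestry_props):
    """Correlate every haplogroup with every autosomal ancestry component.

    haplogroup_freqs: {haplogroup: [frequency in each population]}
    ancestry_props:   {component: [mean ancestry proportion in each population]}
    All lists must list the populations in the same order.
    """
    return {
        (h, k): pearson(hf, ak)
        for h, hf in haplogroup_freqs.items()
        for k, ak in ancestry_props.items()
    }

def best_component(assoc):
    """Pair each haplogroup with its most positively correlated component."""
    best = {}
    for (h, k), r in assoc.items():
        if h not in best or r > best[h][1]:
            best[h] = (k, r)
    return {h: k for h, (k, _) in best.items()}

# Hypothetical toy data for four populations (not real results).
haplo = {"H1": [0.8, 0.5, 0.1, 0.0], "H2": [0.0, 0.2, 0.6, 0.9]}
anc = {"K1": [0.9, 0.6, 0.2, 0.1], "K2": [0.1, 0.4, 0.8, 0.9]}
assoc = association_matrix(haplo, anc)
print(best_component(assoc))
```

In real data, mtDNA and MSY haplogroups would be correlated separately, so that differences between their best-matching components can hint at sex-biased gene flow.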

Presentation 17: Integrated synteny- and similarity-based inference on the polyploidization-fractionation cycle

Keywords: whole genome duplication fractionation flowering plants branching process synteny gene pair similarity distribution evolution comparative genomics
  • Yue Zhang, University of Ottawa, Canada
  • Zhe Yu, University of Ottawa, Canada
  • Chunfang Zheng, University of Ottawa, Canada
  • David Sankoff, University of Ottawa, Canada

Short Abstract: Whole genome doubling, tripling or higher multiplication (WGD), due to the fixation of polyploidization events, is attested in almost all lineages of the flowering plants, recurring two, three or more times in the ancestry of some plants when their history is traced back to the earliest angiosperm. This major mechanism of genome evolution, which generally appears instantaneous on the evolutionary time scale, sets in motion a compensatory process called fractionation: the loss of duplicate genes, initially rapid but continuing over millions and tens of millions of years. We study this process by statistically analyzing the distribution of duplicate gene pairs as a function of their time of creation, as measured by sequence similarity. The stochastic model accounting for this distribution, though exceedingly simple, still has too many rate parameters to be estimated from the similarity distribution alone, while the computational procedures for compiling the distribution from annotated genomic data are heavily biased against earlier polyploidization events because of syntenic "crumble". Other parameters, such as the size of the initial gene complement and the ploidy of the various events giving rise to duplicate gene pairs, are even less accessible. Here we show how the frequency of unpaired genes, identified via their embedding in stretches of duplicate pairs, together with previously established constraints among some parameters, greatly extends the range of successive polyploidization events that can be analyzed. This also allows us to estimate the initial gene complement and to correct for the bias due to crumble. We also discuss how to determine ploidy through recourse to similar gene triples deduced from the duplicate gene data. Finally, we explore the applicability of our methodology to four flowering plant genomes covering a range of different polyploidization histories.
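
To see why unpaired genes help, consider a deliberately simplified toy model, which is an assumption and much simpler than the authors' branching process: a single doubling of N genes, after which each duplicate pair independently loses one copy at a constant rate. Intact pairs alone only reveal the product N·exp(-rate·t), whereas each fractionated pair leaves one unpaired gene, so pairs plus singletons recover N, and the rate follows. The function names and parameter values below are hypothetical.

```python
import math
import random

def simulate_single_wgd(n_genes, loss_rate, elapsed, seed=None):
    """Simulate fractionation after one hypothetical whole genome doubling.

    Every ancestral gene becomes a duplicate pair; each pair independently
    loses one copy after an exponential waiting time with rate `loss_rate`.
    Pairs whose waiting time exceeds `elapsed` are still intact today.
    Returns (intact_pairs, unpaired_genes).
    """
    rng = random.Random(seed)
    intact = sum(1 for _ in range(n_genes)
                 if rng.expovariate(loss_rate) > elapsed)
    # Each fractionated pair leaves exactly one unpaired (singleton) gene.
    return intact, n_genes - intact

def estimate_from_counts(intact_pairs, unpaired_genes, elapsed):
    """Moment estimates of the initial gene complement and the loss rate.

    Under this toy model N = intact + unpaired, and
    E[intact] = N * exp(-loss_rate * elapsed).
    """
    if intact_pairs <= 0:
        raise ValueError("need at least one intact pair")
    n0 = intact_pairs + unpaired_genes
    rate = -math.log(intact_pairs / n0) / elapsed
    return n0, rate

# Hypothetical parameters: 20,000 genes, loss rate 0.02 per Myr, 50 Myr ago.
pairs, singles = simulate_single_wgd(20000, 0.02, 50.0, seed=1)
n0, rate = estimate_from_counts(pairs, singles, 50.0)
print(n0, round(rate, 4))
```

With several successive events, later doublings re-duplicate surviving genes and old pairs become harder to detect (crumble), which is why the full model needs the additional constraints described in the abstract.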

Presentation 18: Reading the book of life: the language of proteins

Keywords: Evolution language complexity entropy
  • Malay Basu, University of Alabama, Birmingham, United States

Short Abstract: Background: Genomes are remarkably similar to natural language texts. From an information theory perspective, we can think of amino acid residues as letters, protein domains as words, and proteins as sentences consisting of ordered arrangements of protein domains (domain architectures). This work describes our recent efforts towards understanding the linguistic properties of genomes. Results: Our recent work showed that the complexity of “grammars” in all major branches of life is close to a universal constant of ~1.2 bits. This is remarkably similar to natural languages, in which such a universal, yet unexplained, information gain has been observed and is generally used to determine whether a series of symbols represents a language. Here, we describe the implications of these findings and their extension to various areas, with particular emphasis on measuring proteome complexity in human tissues. Conclusion: Our work established the similarity between natural languages and genomes, showed for the first time that a “quasi-universal grammar” of protein domains exists, and measured the minimal proteome complexity required for a functional cell. We also describe proteome complexities in human tissues and their functional significance.
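
The "information gain" of a domain grammar can be illustrated as the reduction in uncertainty about a domain once its N-terminal neighbour is known, i.e. the mutual information between consecutive domains. The sketch below computes that quantity from domain architectures; this bigram definition is an assumption for illustration and may differ from the measure the authors used, and the toy architectures are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, of a Counter of observations."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total)
                for c in counts.values() if c > 0)

def adjacent_domain_information_gain(architectures):
    """Mutual information (bits) between consecutive domains.

    architectures: list of domain-name lists, each ordered N- to C-terminus.
    Equals H(next domain) - H(next domain | previous domain).
    """
    pairs = Counter()
    for arch in architectures:
        pairs.update(zip(arch, arch[1:]))
    if not pairs:
        return 0.0
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    conditional = entropy(pairs) - entropy(left)  # H(next | previous)
    return entropy(right) - conditional

# Hypothetical toy architectures (not real proteome data).
toy = [["SH3", "SH2", "Pkinase"], ["SH3", "SH2", "Pkinase"], ["PH", "RhoGEF"]]
print(adjacent_domain_information_gain(toy))
```

A value near zero means neighbouring domains are nearly independent; larger values mean stronger ordering constraints, i.e. more "grammar". The ~1.2-bit figure quoted in the abstract comes from the authors' own definition and data, which this toy does not reproduce.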

Presentation 19: Review of HLA frequencies by country allowed immunoinformatic prediction of SARS-CoV-2 candidate epitopes specific for South America

Keywords: immunoinformatics epitope allele frequency HLA literature review South America SARS-CoV-2 COVID-19
  • David Requena, Rockefeller University, United States
  • Aldhair Medico, Cayetano Heredia Peruvian University (Universidad Peruana Cayetano Heredia), Peru
  • Ruy D. Chacón, Sao Paulo University (Universidade de São Paulo), Brazil
  • Manuel Ramírez, San Marcos University (Universidad Nacional Mayor de San Marcos), Peru
  • Obert Marín-Sánchez, San Marcos University (Universidad Nacional Mayor de San Marcos), Peru

Short Abstract: SARS-CoV-2 is the causative agent of the COVID-19 pandemic. South America is the most affected region per capita, with more than 6 million cases and 200,000 deaths as of August 2020. Numerous ongoing efforts to control the disease include the development of peptide-based immunodiagnostic tests and vaccines, which require knowledge of the allele frequencies of the HLA system. The largest repository of HLA frequencies is the Allele Frequency Net Database (AFNDB), widely used by researchers worldwide. However, it has a passive data collection strategy, relying on researchers to upload their studies' data. This results in the under-representation of many countries, with only a few studies for South America. To address this problem, we complemented the available data with an extensive review of studies reporting HLA frequencies in South American populations. Studies available in PubMed from 1990 onwards that genotyped HLA alleles at 4-digit resolution were selected. As a result, we obtained more than 12 million new data points. We combined the datasets selected per country (matching technology and nomenclature) and calculated weighted average frequencies per allele. This is summarized in the first integrated map of HLA allele frequencies in South America. Both the methodology and the information collected are presented in full detail to guarantee reproducibility. Then, using the most frequent South American HLA alleles (weighted frequency >5%), linear T-cell epitopes were predicted in SARS-CoV-2 proteins. We used state-of-the-art prediction software based on artificial neural networks: NetMHCpan-v4.0 and MHCflurry-v1.6.0 (for HLA-I) and NetMHCIIpan-v4.0 (for HLA-II). Predicted Class-I and Class-II peptides were selected according to their binding to South American alleles. Class-II peptides were also filtered according to their three-dimensional accessibility.
We selected 27 HLA-I and 34 HLA-II candidate epitopes, of which 14 and 4, respectively, have experimental evidence in other coronaviruses, as reported in the Immune Epitope Database and Analysis Resource (IEDB). Recent similar studies have presented SARS-CoV-2 candidate epitopes based on their similarity to experimentally detected SARS epitopes. They attempted worldwide coverage, using either the most frequent HLA supertypes or the IEDB population coverage tool (based on AFNDB information). Here, we show that this resulted in poor coverage for South America. Therefore, our study provides valuable information for regional epitope-based strategies against SARS-CoV-2. Additionally, the updated HLA frequencies provide a better representation of South America and could be useful in immunogenetic studies of various conditions, such as infectious and autoimmune diseases, cancer and anti-tumor immune responses, and organ transplantation.
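
The pooling and selection steps can be sketched as follows. The abstract does not state which weights were used, so this sketch assumes each study is weighted by its sample size and that an allele missing from a study's table has frequency 0 there. The coverage function uses the standard Hardy-Weinberg approximation of the kind applied by population-coverage calculations: the share of individuals carrying at least one listed allele at a locus is 1 - (1 - p)^2, where p is the summed allele frequency. All function names and the toy frequencies are hypothetical.

```python
def weighted_allele_frequencies(studies):
    """Sample-size-weighted mean frequency of each allele across studies.

    studies: list of (sample_size, {allele: frequency}) for one country and locus.
    Assumption: an allele absent from a study's table has frequency 0 there.
    """
    total_n = sum(n for n, _ in studies)
    alleles = {a for _, freqs in studies for a in freqs}
    return {
        a: sum(n * freqs.get(a, 0.0) for n, freqs in studies) / total_n
        for a in alleles
    }

def frequent_alleles(freqs, threshold=0.05):
    """Alleles whose weighted frequency exceeds the threshold (5% in the abstract)."""
    return sorted(a for a, f in freqs.items() if f > threshold)

def population_coverage(freqs, alleles):
    """Share of individuals carrying at least one listed allele at one locus,
    assuming Hardy-Weinberg proportions: 1 - (1 - p)^2, p = summed frequency."""
    p = sum(freqs.get(a, 0.0) for a in alleles)
    return 1.0 - (1.0 - p) ** 2

# Hypothetical toy studies for one country; the frequencies are invented.
studies = [
    (100, {"A*02:01": 0.20, "A*24:02": 0.10}),
    (300, {"A*02:01": 0.30, "A*24:02": 0.02, "A*68:01": 0.04}),
]
freqs = weighted_allele_frequencies(studies)
selected = frequent_alleles(freqs)
print(selected, round(population_coverage(freqs, selected), 4))
```

Weighting by sample size keeps a small study with an unusual allele frequency from dominating the pooled estimate for a country.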

Presentation 20: Vesicular glutamate transporter (VGLUT) genes: origin, evolution and molecular signatures underpinning glutamate transport in animals

Keywords: Vesicular glutamate transporters (VGLUTs) Excitatory synapse Endocytic motifs gene loss Lineage-specific duplications Molecular signatures
  • Nicolas Zuniga, Universidad de Concepcion, Chile
  • Patricio Castro, Universidad de Concepcion, Chile
  • Felipe Aguilera, Universidad de Concepcion, Chile

Short Abstract: VGLUT genes play essential roles in excitatory synaptic transmission by concentrating glutamate into presynaptic vesicles. In vertebrates, they encode three highly homologous proteins (VGLUT1-3), products of the solute carrier genes SLC17A6-8, which are expressed mainly in glutamatergic neurons of the neocortex, hippocampus, and hypothalamus. Although these genes are evolutionarily conserved in vertebrates, their origin and early evolution are not well understood. Here, we performed thorough phylogenetic and structural analyses, spanning 110 eukaryotic species, showing that VGLUT is more closely related to Sialin (SLC17A5) than to other SLC17 family members. We also identified two distinct phylogenetic clades of VGLUT genes, one comprising vertebrates and the other comprising ambulacrarians, protostomes, xenacoelomorphs, and cnidarians. In addition, we discovered a new clade of invertebrate phosphate transporters that is more closely related to VGLUT and Sialin than to the phosphate transporters SLC17A1-4. The evolution of VGLUT genes in modern animal lineages is typified by the loss of one or more members in vertebrates and by lineage-specific duplications in some invertebrate species. Vertebrate VGLUTs show three critical regions associated with vesicular endocytosis and recycling: the transmembrane glutamate-binding region, the di-leucine-like motifs, and the proline-rich domain at the C-terminus. Comparative analyses revealed that glutamate-binding residues and di-leucine-like motifs emerged at the dawn of bilaterian animals, with the subsequent loss of di-leucine-like motifs in hemichordates. Furthermore, structural comparisons showed that mammalian VGLUT1 has a double proline-rich domain at the C-terminus, which is absent in mammalian VGLUT2-3 and in invertebrate VGLUTs.
Altogether, this study not only reveals the origin and evolution of VGLUT genes but also uncovers molecular signatures and endocytic motifs as key drivers in the evolution of glutamate transport in animals.
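
Scanning sequences for the endocytic signals mentioned above can be sketched with a regular expression and a sliding window. This sketch assumes the canonical acidic di-leucine consensus [DE]xxxL[LI] and defines a proline-rich stretch as a window with at least 50% prolines; both definitions are assumptions, since the "di-leucine-like" motifs and proline-rich domains in the study may be defined differently. The toy sequence is hypothetical, not a real VGLUT.

```python
import re

# Canonical acidic di-leucine consensus [DE]xxxL[LI]; the lookahead lets
# overlapping matches be reported.
DILEUCINE = re.compile(r"(?=([DE]...L[LI]))")

def find_dileucine_motifs(seq):
    """Return (1-based start, motif) for every match, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in DILEUCINE.finditer(seq.upper())]

def proline_rich_windows(seq, window=10, min_fraction=0.5):
    """1-based start positions of windows whose proline fraction >= min_fraction."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - window + 1)
            if seq[i:i + window].count("P") / window >= min_fraction]

# Hypothetical toy C-terminal fragment (not a real VGLUT sequence).
toy = "MKTAEQRSLLGDPPAPPLPPSK"
print(find_dileucine_motifs(toy))
print(proline_rich_windows(toy))
```

A pattern scan like this only flags candidate motifs; in a comparative study such hits would be interpreted against the multiple sequence alignment and phylogeny.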

International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176
